Applying recursion to serial and parallel QR factorization leads to better performance
Abstract
We present new recursive serial and parallel algorithms for QR factorization of an m by n matrix. They improve performance. The recursion leads to an automatic variable blocking, and it also replaces a Level 2 part in a standard block algorithm with Level 3 operations. However, there are significant additional costs for creating and performing the updates, which prohibit the efficient use of the recursion for large n. We present a quantitative analysis of these extra costs. This analysis leads us to introduce a hybrid recursive algorithm that outperforms the LAPACK algorithm DGEQRF by about 20% for large square matrices and up to almost a factor of 3 for tall thin matrices. Uniprocessor performance results are presented for two IBM RS/6000 SP nodes—a 120-MHz IBM POWER2 node and one processor of a four-way 332-MHz IBM PowerPC 604e SMP node. The hybrid recursive algorithm reaches more than 90% of the theoretical peak performance of the POWER2 node. Compared to standard block algorithms, the recursive approach also shows a significant advantage in the automatic tuning obtained from its automatic variable blocking. A successful parallel implementation on a four-way 332-MHz IBM PPC604e SMP node based on dynamic load balancing is presented. For two, three, and four processors it shows speedups of up to 1.97, 2.99, and 3.97.
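The recursive idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes a compact WY representation Q = I - Y T Y^T and splits the columns in half at each level, so the update of the right half and the assembly of T are matrix-matrix (Level 3 style) products. The function names `householder` and `rgeqr3` are hypothetical, chosen for this illustration. This is not the authors' implementation, and it shows neither the hybrid algorithm nor the parallel version.

```python
import numpy as np

def householder(x):
    """Return (v, tau, beta) such that (I - tau*v*v^T) x = beta*e1, with v[0] = 1."""
    x = np.asarray(x, dtype=float)
    normx = np.linalg.norm(x)
    if normx == 0.0:
        v = np.zeros_like(x)
        v[0] = 1.0
        return v, 0.0, 0.0
    beta = -np.copysign(normx, x[0])   # sign chosen to avoid cancellation
    v = x / (x[0] - beta)
    v[0] = 1.0
    tau = (beta - x[0]) / beta
    return v, tau, beta

def rgeqr3(A):
    """Recursive QR sketch: return (Y, T, R) with A = (I - Y T Y^T)[:, :n] @ R.

    Assumes A is m x n with m >= n. Illustrative only.
    """
    A = np.array(A, dtype=float)
    m, n = A.shape
    if n == 1:
        # Base case: a single Householder reflector.
        v, tau, beta = householder(A[:, 0])
        return v.reshape(m, 1), np.array([[tau]]), np.array([[beta]])
    n1 = n // 2
    # Factor the left half recursively.
    Y1, T1, R1 = rgeqr3(A[:, :n1])
    # Update the right half with Q1^T = I - Y1 T1^T Y1^T (matrix-matrix products).
    B = A[:, n1:] - Y1 @ (T1.T @ (Y1.T @ A[:, n1:]))
    # Factor the trailing part of the updated right half recursively.
    Y2s, T2, R2 = rgeqr3(B[n1:, :])
    Y2 = np.vstack([np.zeros((n1, n - n1)), Y2s])
    # Merge the two compact WY factors: Q1 Q2 = I - [Y1 Y2] T [Y1 Y2]^T.
    T3 = -T1 @ (Y1.T @ Y2) @ T2
    zeros = np.zeros((n - n1, n1))
    Y = np.hstack([Y1, Y2])
    T = np.block([[T1, T3], [zeros, T2]])
    R = np.block([[R1, B[:n1, :]], [zeros, R2]])
    return Y, T, R

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((9, 5))
    Y, T, R = rgeqr3(A)
    Q = np.eye(9)[:, :5] - Y @ T @ Y[:5, :].T   # thin Q
    print("residual:", np.linalg.norm(Q @ R - A))
```

The halving of the column range is what produces the automatic variable blocking mentioned above: no block size is chosen by hand. The extra work of forming T3 at every level is one source of the additional update cost the abstract refers to for large n.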
Similar resources
High-Performance Library Software for QR Factorization
In [5, 6], we presented algorithm RGEQR3, a purely recursive formulation of the QR factorization. Using recursion leads us to a natural way to choose the k-way aggregating Householder transform of Schreiber and Van Loan [10]. RGEQR3 is a performance-critical subroutine for the main (hybrid recursive) routine RGEQRF for QR factorization of a general m × n matrix. This contribution presents a new ...
A high-performance algorithm for the linear least squares problem on SMP systems
We present new recursive serial and parallel algorithms for the linear least squares problem AX = B, where A is m by n, m ≥ n. The algorithms improve performance. This work is an extension of our work on QR factorization [4]. The key idea is to combine the computation of Q^T B with the QR factorization, thereby saving computations compared to the standard LAPACK algorithm. Recursion allows us to r...
New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems
We present a new recursive algorithm for the QR factorization of an m by n matrix A. The recursion leads to an automatic variable blocking that allows us to replace a level 2 part in a standard block algorithm by level 3 operations. However, there are some additional costs for performing the updates, which prohibit the efficient use of the recursion for large n. This obstacle is overcome by using...
Parallel Algorithms for Toeplitz Systems
We describe some parallel algorithms for the solution of Toeplitz linear systems and Toeplitz least squares problems. First we consider the parallel implementation of the Bareiss algorithm (which is based on the classical Schur algorithm). The alternative Levinson algorithm is less suited to parallel implementation because it involves inner products. The Bareiss algorithm computes the LU factor...
Enhancing Parallelism of Tile QR Factorization for Multicore Architectures
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist of scheduling a Directed Acyclic Graph (DAG) of fine granularity tasks where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on mod...
Journal title:
- IBM Journal of Research and Development
Volume 44, Issue
Pages -
Publication date 2000